The Internet Movie Database (IMDb) provides a variety of information about movies, such as total budgets, lengths, actors, and user ratings. The data are publicly available from here. In this lab, we explore a processed dataset named 'imdb.csv', which contains some basic information about movies.
Download the file from Canvas. It has 4 tab-separated columns:
First, we want to get some insights from the data with Python; then we want to display the information on a web page and prettify it with HTML/CSS.
Things to note:
To do this, we first need to read the CSV file. Python provides the csv module for reading and writing CSV files. The csv.reader function returns an object that iterates over the lines of the given file. Each line is returned as a list of strings, so we can access a particular column by list index. To skip the first line, we can use islice from the itertools module. It works like list slicing, but on any iterator (e.g. a file stream). For instance, islice(reader, 0, 5) means "give me the first 5 items from the reader", and islice(reader, 1, 5) means "give me the 4 items starting from the second item".
A basic usage example that reads the first 5 lines of 'imdb.csv':
In [20]:
import csv
from itertools import islice
with open('imdb.csv', 'r', newline='') as f:
    reader = csv.reader(f, delimiter='\t')
    # Print each of the first 5 rows, followed by its second column.
    for row in islice(reader, 0, 5):
        print(row)
        print(row[1])
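To see how islice can skip the header row, here is a minimal sketch that runs on an in-memory sample instead of the real file. The sample rows and the column order (Title, Year, Rating, Votes) are assumptions made for illustration; check them against the actual 'imdb.csv'.

```python
import csv
import io
from itertools import islice

# Hypothetical tab-separated sample; the real file's columns may differ.
sample = "\n".join([
    "Title\tYear\tRating\tVotes",
    "Movie A\t1972\t9.2\t1000",
    "Movie B\t1994\t9.3\t2000",
    "Movie C\t2008\t9.0\t1500",
]) + "\n"

reader = csv.reader(io.StringIO(sample), delimiter="\t")

# Start at index 1 to skip the header; a stop of None means "until the end".
data_rows = list(islice(reader, 1, None))

for row in data_rows:
    print(row[0], row[1])
```

With a real file, the same islice(reader, 1, None) call can be applied to a reader built from open('imdb.csv').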
There are many ways to do Q1. One way is to use dictionaries where the key: value pairs are:
In [2]:
dt = {}
year = 1972
if year not in dt:
    dt[year] = 1
else:
    dt[year] += 1
print(dt)
Python's Counter class (from the collections module) automates the bookkeeping above: a key that has not been seen yet counts as 0.
In [3]:
from collections import Counter
movie_counter = Counter()
movie_counter[1972] += 1
print(movie_counter[1972])  # 1
print(movie_counter[1970])  # 0: missing keys default to zero
Once all lines are read, we want to print the dictionary, which can be done by iterating its key: value pairs.
In [4]:
for key, val in dt.items():
    print(key, val)
for key, val in movie_counter.items():
    print(key, val)
You can get the keys (the years) with the .keys() method.
In [5]:
movie_counter[1980] += 5
movie_counter[2015] += 1
movie_counter.keys()
Out[5]:
In [6]:
alist = [23,3,5,4,2,1,1,0,1000]
print(min(alist))
print(max(alist))
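The pieces above (csv.reader, Counter, min and max) can be combined into one small function. The following is only a sketch on a made-up sample: the data, the helper name count_movies_by_year, and the assumption that the year sits in the second column are all hypothetical.

```python
import csv
import io
from collections import Counter

# Hypothetical tab-separated sample; the real column order may differ.
sample = "\n".join([
    "Title\tYear\tRating\tVotes",
    "Movie A\t1972\t9.2\t1000",
    "Movie B\t1972\t8.1\t300",
    "Movie C\t1994\t9.3\t2000",
    "Movie D\t2008\t9.0\t1500",
]) + "\n"


def count_movies_by_year(lines, year_column=1):
    """Count rows per year; year_column is an assumed column position."""
    reader = csv.reader(lines, delimiter="\t")
    next(reader)  # skip the header row
    counts = Counter()
    for row in reader:
        counts[int(row[year_column])] += 1
    return counts


year_counts = count_movies_by_year(io.StringIO(sample))

# Iterating a Counter yields its keys, so min/max give the year range.
print(min(year_counts), max(year_counts))
for year, n in sorted(year_counts.items()):
    print(year, n)
```

Passing an open file object instead of the io.StringIO wrapper should work the same way, since csv.reader accepts any iterable of lines.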
Code for Q1
In [30]:
import pandas as pd
imdb = pd.read_csv('imdb.csv', delimiter='\t')
In [31]:
imdb.head()
Out[31]:
In [34]:
min(imdb['Year'])
Out[34]:
In [35]:
max(imdb['Year'])
Out[35]:
In [48]:
from collections import Counter
Counter(imdb["Year"])
Out[48]:
We can store the ratings/votes column as a list and then calculate various basic statistics (mean, median, etc.). To do this, we can use the NumPy library and call the function numpy.mean and numpy.median. For example,
In [10]:
import numpy as np
alist = [1,3,6,2,5,2]
print(np.mean(alist))
print(np.median(alist))
Code for Q2
In [41]:
# implement below
imdb['Rating'].mean()
Out[41]:
In [42]:
imdb['Votes'].mean()
Out[42]:
Store the movie titles and ratings as a dictionary:
Then we can sort the dictionary by its values, which returns a list of tuples. Remember to print only the top 5 movies.
In [12]:
import operator
dt = {1971: 2, 1975: 10, 1962: 1, 1980: 50, 1981: 55}
sorted_x_by_val = sorted(dt.items(), key=operator.itemgetter(1), reverse=True)
print(sorted_x_by_val)
for elem in sorted_x_by_val:
    print(elem[0], elem[1])
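To keep only the top 5 entries, the sorted list of tuples can be sliced. Here is a minimal sketch with a hypothetical title-to-rating dictionary; the titles and ratings are invented for illustration.

```python
import operator

# Hypothetical title -> rating mapping, not taken from imdb.csv.
ratings = {
    "Movie A": 7.1,
    "Movie B": 8.4,
    "Movie C": 6.0,
    "Movie D": 9.0,
    "Movie E": 8.9,
    "Movie F": 9.3,
    "Movie G": 5.5,
}

# Sort (title, rating) pairs by rating, highest first, then keep five.
top5 = sorted(ratings.items(), key=operator.itemgetter(1), reverse=True)[:5]

for title, rating in top5:
    print(title, rating)
```

Slicing with [:5] is safe even when the dictionary holds fewer than five entries; it simply returns what is there.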
Code for Q3
In [45]:
# implement below
# sort_values replaces sort_index(by=...), which newer pandas versions no longer accept.
imdb.sort_values(by='Rating', ascending=False).head()
Out[45]:
In [47]:
imdb.sort_values(by='Votes', ascending=False).head()
Out[47]:
Many browsers don't allow loading local files directly due to security concerns. We can work around this by starting a local web server with Python as follows:
If successful, you'll see
Serving HTTP on 0.0.0.0 port 8000 …
This means that your computer is now running a web server that listens on port 8000; the address 0.0.0.0 means it accepts connections on all of the machine's network interfaces. You can now open a browser and type "localhost:8000" in the address bar to connect to this web server (on some systems "0.0.0.0:8000" also works). Then click on the different links. You can also access a file directly by typing 'localhost:8000/NAME_OF_YOUR_FILE.html' in the address bar.
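The exact command used in this lab is not reproduced here. As one possible approach, assuming Python 3.7 or later, the standard library's http.server module can serve the current folder, either from the command line with `python -m http.server 8000` or from a script. The helper name make_server below is hypothetical.

```python
import functools
import http.server
import socketserver


def make_server(host="", port=8000, directory="."):
    """Build (but do not start) a static-file server for `directory`.

    An empty host listens on all interfaces, like the 0.0.0.0 case above.
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    return socketserver.TCPServer((host, port), handler)
```

To actually serve files, call `make_server().serve_forever()` and stop it with Ctrl+C; the call blocks until interrupted.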
Web pages are written in a standard markup language called HTML (HyperText Markup Language). The basic syntax of HTML consists of elements marked by tags, which are enclosed within '<' and '>' symbols. Browsers such as Firefox and Chrome parse these tags and display the content of a web page in the designated format. This is called rendering.
Here is a list of important tags and their descriptions.
Using the top 5 most-voted movies found in the first part, try the following:
Test your code by visiting the web page on your local server. Name the .html file 'lab02_html_lastname_firstname' and upload it to Canvas.
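If you would rather generate the page from Python than type it by hand, the sketch below builds a small HTML table from a list of (title, votes) pairs. The movie data, the helper name build_page, and the choice of a table layout are assumptions made for illustration, not requirements of the lab.

```python
import html

# Hypothetical (title, votes) pairs; substitute your own top 5 results.
top_movies = [
    ("Movie A", 2000),
    ("Movie B", 1800),
    ("Movie C", 1500),
    ("Movie D", 1200),
    ("Movie E", 1000),
]


def build_page(movies):
    """Return a complete HTML document with one table row per movie."""
    rows = "\n".join(
        # html.escape keeps characters such as & and < from breaking the markup.
        f"      <tr><td>{html.escape(title)}</td><td>{votes}</td></tr>"
        for title, votes in movies
    )
    return (
        "<!DOCTYPE html>\n"
        "<html>\n"
        "  <head>\n"
        "    <title>Top movies</title>\n"
        "  </head>\n"
        "  <body>\n"
        "    <h1>Top 5 movies by votes</h1>\n"
        "    <table>\n"
        "      <tr><th>Title</th><th>Votes</th></tr>\n"
        f"{rows}\n"
        "    </table>\n"
        "  </body>\n"
        "</html>\n"
    )


page = build_page(top_movies)
print(page)
```

Writing the returned string to a .html file in the folder served by your local web server lets you view it in the browser.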
While HTML directly deals with the content and structure, CSS (Cascading Style Sheets) is the primary language that is used for the look and formatting of a web document.
A CSS stylesheet consists of one or more selectors, properties and values. For example:
body {
    background-color: white;
    color: steelblue;
}
Selectors specify the HTML elements to which the styles (combinations of properties and values) are applied. In the above example, all text within the 'body' tags will be in steelblue.
There are three ways to include CSS code in HTML. This is called ‘referencing’.
Embed CSS in HTML - You can place the CSS code within 'style' tags inside the 'head' tags. This keeps everything within a single HTML file, but it makes the file longer.
<head>
    <style type="text/css">
        .description {
            font: 16px "Times New Roman", serif;
        }
        .viz {
            font: 10px sans-serif;
        }
    </style>
</head>
Reference an external stylesheet from HTML - This is a much cleaner way but results in the creation of another file. To do this, you can copy the CSS code into a text file and save it as a ‘.css’ file in the same folder as the HTML file. In the document head in the HTML code, you can then do the following:
<head>
    <link rel="stylesheet" href="stylesheet.css">
</head>
Attach inline styles - You can also directly attach the styles in-line along with the main HTML code in the body. This makes it easy to customize specific elements but makes the code very messy - the design and content get mixed up.
<p style="color: green; font-size: 36px; font-weight: bold;">
    Inline styles can help when using D3.
</p>
Can you redo questions 3-5 in the previous section using only CSS? Name the .ipynb file 'lab02_css_lastname_firstname' and upload it to Canvas.